ggplot2 package
ggplot2 implements the Layered Grammar of Graphics, a system for building visualizations that is built around cases and variables.
library(ggplot2)
mpg
# A tibble: 234 × 11
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
3 audi a4 2 2008 4 manu… f 20 31 p comp…
4 audi a4 2 2008 4 auto… f 21 30 p comp…
5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
# … with 224 more rows
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy))
aes
There are different types of aesthetics, which change the properties of the objects that are drawn:
- color
- shape
- size
- alpha
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy, size = cyl, color = class))
Example:
- In the previous graphic, make all the dots blue (color = "blue").
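Inside aes(): the value is treated as data. ggplot2 therefore passes it through a scale to decide how to draw it, so color = "blue" becomes a one-level variable that gets the default palette color and a legend entry.
Outside aes(): ggplot2 treats the value as a visual property and sets it directly, so every point is drawn in blue.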
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy, color = "blue"))
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy), color = "blue")
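One way to check the difference between mapping and setting is to inspect the data that ggplot2 computes for the layer. The following is a minimal sketch using layer_data(); the object names p_mapped and p_set are illustrative and do not come from the original material.

```r
library(ggplot2)

# Mapped: "blue" is treated as a constant data value and sent through the colour scale
p_mapped <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

# Set: "blue" is used directly as the colour of every point
p_set <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

# Colours actually assigned to the points in each plot
unique(layer_data(p_mapped)$colour)  # a default palette colour, not "blue"
unique(layer_data(p_set)$colour)     # "blue"
```

Comparing the two results makes it visible that only the second call draws blue points.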
Example:
- In the above chart, draw all the points with displ < 5 in one color and those with displ >= 5 in another.
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy, color = displ < 5))
facet
Subplots that display subsets of the data.
- facet_wrap(): facets depending on a single discrete variable
- facet_grid(): facets according to 2 variables (rows ~ columns, . for no split)
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy)) +
facet_wrap(~ class)
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
Example:
- Check what happens when . is used instead of one of the variables in the formula inside facet_grid().
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy)) +
facet_grid(drv ~ .)
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
geom
What is the difference between these two graphics?
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy))
ggplot(data = mpg) +
geom_smooth(aes(x = displ, y = hwy))
geom: the geometric object that a plot uses to represent data.
Each geom function accepts a mapping created with aes(), although not every aesthetic works with every geom. For example, you can set the shape of a geom_point() but not of a geom_line().
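The sketch below illustrates what happens when an aesthetic is not supported by a geom. It assumes the current ggplot2 behavior of warning about unknown aesthetics when the layer is created; the names p_points and shape_warning are illustrative.

```r
library(ggplot2)

# shape is a valid aesthetic for points
p_points <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, shape = drv))

# geom_line() has no shape aesthetic; ggplot2 warns and ignores the mapping
shape_warning <- tryCatch(
  {
    geom_line(aes(x = displ, y = hwy, shape = drv))
    NULL  # reached only if no warning was raised
  },
  warning = function(w) conditionMessage(w)
)

# Text of the captured warning
shape_warning
```

Capturing the warning with tryCatch() is only for demonstration; in normal use the warning is simply printed to the console.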
Example:
- Make 3 different figures from the previous figure using the variable drv with the aesthetics color, linetype and group.
ggplot(data = mpg) +
geom_smooth(aes(x = displ, y = hwy, color = drv))
ggplot(data = mpg) +
geom_smooth(
aes(x = displ, y = hwy, linetype = drv)
)
ggplot(data = mpg) +
geom_smooth(aes(x = displ, y = hwy, group = drv))
Each new geom adds a new layer to the graph.
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy)) +
geom_smooth(aes(x = displ, y = hwy))
Mappings and data included in ggplot() will be applied globally to all layers.
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
Mappings and data included in a geom_*() function override the global settings for that layer only.
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth()
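A layer can also ignore the global mappings altogether. The following sketch uses the inherit.aes argument that layers accept; the plot objects p_base and p_single_smooth are illustrative names.

```r
library(ggplot2)

# Global mapping includes colour, so every layer inherits it by default
p_base <- ggplot(data = mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# This smoother ignores the global mapping and declares its own,
# so a single curve is fitted across all drive types
p_single_smooth <- p_base +
  geom_smooth(aes(x = displ, y = hwy), inherit.aes = FALSE, se = FALSE)

p_single_smooth
```

Without inherit.aes = FALSE, the smoother would inherit color = drv and draw one curve per drive type.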
Each layer can use a different data frame.
The data argument must be specified in any geom that uses a dataset other than the one given in ggplot().
mpg_subcompact <- mpg[mpg$class == "subcompact", ]
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(data = mpg_subcompact, se = FALSE)
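As an alternative to creating a separate data frame, the data argument of a layer also accepts a function, which ggplot2 calls with the plot's data. This is a small sketch of that option; only_subcompact is a hypothetical helper introduced here for illustration.

```r
library(ggplot2)

# Hypothetical helper: keeps only the subcompact cars from whatever data the plot uses
only_subcompact <- function(d) d[d$class == "subcompact", ]

ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(data = only_subcompact, se = FALSE)
```

The result should match the version that passes mpg_subcompact explicitly, while keeping the subset tied to the data given in ggplot().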
Example:
- Recreate the code in R needed to generate the following figures, from:
p <- ggplot(data = mpg, aes(x = displ, y = hwy))
p + geom_point() +
geom_smooth(se = FALSE)
p + geom_point() +
geom_smooth(aes(group = drv), se = FALSE)
p + geom_point(aes(color = drv)) +
geom_smooth(aes(color = drv), se = FALSE)
p + geom_point(aes(color = drv)) +
geom_smooth(se = FALSE)
p + geom_point(aes(color = drv)) +
geom_smooth(aes(linetype = drv), se = FALSE)
p + geom_point(aes(color = drv)) +
geom_smooth(aes(group = drv), se = FALSE)
Position adjustments determine how overlapping geoms are arranged.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
If fill is mapped to a different variable, such as clarity, the bars are stacked automatically. This stacking is controlled by the position argument, which defaults to "stack" for geom_bar().
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity))
Example:
- Modify the above graphic using different values in the position parameter ("stack", "dodge", "identity", "fill").
p <- ggplot(diamonds, aes(x = cut, fill = clarity))
p + geom_bar()
p + geom_bar(position = "stack")
p + geom_bar(position = "dodge")
p + geom_bar(position = "identity")
p + geom_bar(position = "fill")
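To see numerically what two of these adjustments do, the computed bar segments can be compared. The sketch below assumes layer_data() for extracting the computed values; the names stacked and filled are illustrative.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = cut, fill = clarity))

# Bar segments as computed under two position adjustments
stacked <- layer_data(p + geom_bar(position = "stack"))
filled  <- layer_data(p + geom_bar(position = "fill"))

# Height of each complete bar: counts for "stack", proportions for "fill"
tapply(stacked$ymax, as.integer(stacked$x), max)  # total number of diamonds per cut
tapply(filled$ymax, as.integer(filled$x), max)    # 1 for every cut
```

In other words, "fill" rescales every stacked bar to the same height, which makes proportions comparable across cuts.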
p <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth()
p +
labs(title = "Fuel efficiency vs. Engine size",
x = "Engine displacement (L)",
y = "Highway fuel efficiency (mpg)",
color = "Type of Car",
caption = "Data from fueleconomy.gov")
ggplot2 normally adds default scales automatically. Scale functions are named scale_ + name of the aesthetic + _ + name of the scale, for example scale_x_continuous().
(p <- ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)))
p +
scale_x_continuous() +
scale_y_continuous() +
scale_color_discrete()
p +
scale_color_discrete(labels = c("A" , "B", "C", "D", "E", "F", "G"))
The scales of the axes can be modified:
p +
scale_x_continuous(labels = NULL) +
scale_y_continuous(breaks = seq(15, 40, by = 5))
p +
scale_y_log10(breaks = seq(15, 40, by = 5))
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))
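coord_cartesian() zooms into a region without discarding data, whereas limits set on the scales (for example with xlim() and ylim()) turn out-of-range values into missing values before statistics such as smoothers are computed. The following sketch contrasts the two; p_base, p_zoom and p_limits are illustrative names.

```r
library(ggplot2)

p_base <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Zoom: all cars are kept, only the visible region changes
p_zoom <- p_base + coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))

# Scale limits: values outside the limits become NA and are dropped when drawing
p_limits <- p_base + xlim(5, 7) + ylim(10, 30)

# Number of points whose x value was replaced by NA in each version
sum(is.na(layer_data(p_zoom)$x))
suppressWarnings(sum(is.na(layer_data(p_limits)$x)))
```

For a plot that includes geom_smooth(), this difference matters because the smoother in the zoomed version is still fitted to the full dataset.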
You can change the appearance of elements that do not come from the data with the theme_*() functions.
p + theme_bw()
p + theme_grey()
p + theme_light()
p + theme_dark()
Make all figures generated with ggplot2 use the same theme:
theme_set(theme_bw())
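theme_set() returns the previously active theme, so the change can be undone later. A minimal sketch, with old_theme as an illustrative name:

```r
library(ggplot2)

# theme_set() returns the theme that was active before the change
old_theme <- theme_set(theme_bw())

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
p  # drawn with theme_bw()

# Restore the earlier default once it is no longer needed
theme_set(old_theme)
```

Restoring the old theme is useful in scripts that should not change the appearance of plots created afterwards.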
There is a package with additional themes: ggthemes
library(ggthemes)
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
geom_point() +
labs(title = "mpg")
# Economist theme
p + theme_economist()
# Economist theme + color palette
p + theme_economist() + scale_colour_economist()
It is also possible to define your own themes.
theme_jesus <- function () {
theme_bw(base_size=12, base_family="Courier") %+replace%
theme(
panel.background = element_blank(),
plot.background = element_rect(fill="gray96", colour=NA),
legend.background = element_rect(fill="transparent", colour=NA),
legend.key = element_rect(fill="transparent", colour=NA)
)
}
p + theme_bw()
p + theme_jesus()
Exercise:
- Experiment with labels, themes and scales in order to create a figure like this, based on the diamonds data (x: carat, y: price, color: cut):
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point() +
geom_smooth(aes(color = cut), se = FALSE) +
labs(title = "Ideal cut diamonds command the best price for every carat size",
subtitle = "Lines show GAM estimate of mean values for each level of cut",
caption = "Data provided by Hadley Wickham",
x = "Log Carat Size",
y = "Log Price Size",
color = "Cut Rating") +
scale_x_log10() +
scale_y_log10() +
scale_color_brewer(palette = "Greens") +
theme_light()
ggsave() saves the most recently displayed plot; with a relative filename, the file is written to the working directory:
ggsave("my-plot.pdf", width = 6, height = 6)
ggsave("my-plot.png", width = 6, height = 6)
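Relying on the last displayed plot can be fragile in longer scripts. The sketch below passes the plot explicitly through the plot argument of ggsave(); the object p_scatter and the temporary output path are assumptions made for illustration.

```r
library(ggplot2)

p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Hypothetical output location; a temporary file keeps the working directory clean
out_file <- file.path(tempdir(), "my-plot.pdf")

# Passing plot = makes the call independent of what was displayed last;
# width and height are in inches by default
ggsave(out_file, plot = p_scatter, width = 6, height = 6)

file.exists(out_file)
```

The file format is chosen from the extension of the filename, as in the calls above.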
Another way, using a base R graphics device:
png("my-plot_4.png", width = 800, height = 600)
print(p)
dev.off()
quartz_off_screen
2
Plotly is a JavaScript-based graphics library that generates interactive graphics.
Its output is HTML-based, so it can only be used in HTML documents.
library(plotly)
(p <- ggplotly(p))
library(gridExtra)
p1 <- ggplot(diamonds, aes(x = carat, y = price)) +
geom_point()
p2 <- ggplot(diamonds, aes(x = carat, y = price)) +
geom_smooth(aes(color = cut), se = FALSE)
grid.arrange(p1, p2, nrow = 1)